10 May, 2021

Introduction

Covid-19 and goals for the project

Materials and methods

Datasets and workflow

  • Covid-19 data from the GitHub repository of Johns Hopkins University
  • Demographic and social factors on a country basis from Gapminder and the World Bank
  • Country latitude and longitude data from the “maps” package dataset

Cleaning & Augmenting

Cleaning: issues with the datasets:

  • Time-series data in a very wide format (one column per date)
  • Country names were not consistent across data sources
  • Multiple files had to be combined
  • Recoveries are reported inconsistently
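
The two main cleaning steps, reshaping the wide time series into long (country, date, value) records and harmonising country names, could look roughly like the sketch below. It assumes a CSV laid out with Province/State, Country/Region, Lat and Long columns followed by one column per date; the alias table COUNTRY_ALIASES and the helpers harmonize and wide_to_long are hypothetical illustrations, not the project's actual cleaning code, and the sample rows are made up.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical alias table: source-specific spellings -> one canonical name.
# The entries are illustrative; a real table would come from comparing sources.
COUNTRY_ALIASES: Dict[str, str] = {
    "US": "United States",
    "Korea, South": "South Korea",
    "Taiwan*": "Taiwan",
}

# Identifier columns assumed to precede the date columns.
ID_COLUMNS = ("Province/State", "Country/Region", "Lat", "Long")


def harmonize(name: str) -> str:
    """Return the canonical country name, or the stripped input if no alias exists."""
    name = name.strip()
    return COUNTRY_ALIASES.get(name, name)


def wide_to_long(csv_text: str) -> List[Tuple[str, str, int]]:
    """Turn one-column-per-date rows into (country, date, value) records.

    Rows for provinces of the same country are summed into a country total.
    Dates keep the column order of the file (dict insertion order).
    """
    totals: Dict[Tuple[str, str], int] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        country = harmonize(row["Country/Region"])
        for column, value in row.items():
            if column in ID_COLUMNS:
                continue  # keep only the date columns
            key = (country, column)
            totals[key] = totals.get(key, 0) + int(value)
    return [(country, date, value) for (country, date), value in totals.items()]


# Made-up sample in the assumed layout.
sample = (
    "Province/State,Country/Region,Lat,Long,1/22/20,1/23/20\n"
    "Alberta,Canada,53.9,-116.5,0,1\n"
    "Ontario,Canada,51.3,-85.3,0,2\n"
    ',"Korea, South",35.9,127.8,1,1\n'
)
for record in wide_to_long(sample):
    print(record)
```

Summing provinces matters because several countries (e.g. Canada and Australia) are reported per province in the wide files.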

Augmentation:

  • Calculate cases, deaths, and recoveries per 100K citizens
  • Additional augmentations: rolling means and new cases per day
  • For the Shiny app: join latitude and longitude data to the country-level Covid data
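
The augmentations above amount to a few small calculations per country. The sketch below assumes cumulative counts that are already in date order; the helper names per_100k, daily_new and rolling_mean, the 7-day default window and the choice to keep the first day's cumulative value as its "new" count are all illustrative assumptions.

```python
from typing import List, Optional


def per_100k(count: float, population: float) -> float:
    """Scale a raw count to a rate per 100,000 inhabitants."""
    return count / population * 100_000


def daily_new(cumulative: List[int]) -> List[int]:
    """Day-over-day differences of a cumulative series.

    The first day keeps its cumulative value (an assumption). Downward
    corrections in the source data can make some differences negative.
    """
    if not cumulative:
        return []
    return [cumulative[0]] + [today - yesterday
                              for yesterday, today in zip(cumulative, cumulative[1:])]


def rolling_mean(values: List[float], window: int = 7) -> List[Optional[float]]:
    """Trailing rolling mean; None until a full window of values is available."""
    means: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            means.append(None)
        else:
            means.append(sum(values[i + 1 - window:i + 1]) / window)
    return means


cumulative_cases = [1, 3, 6, 6, 10]
print(daily_new(cumulative_cases))                          # [1, 2, 3, 0, 4]
print(rolling_mean(daily_new(cumulative_cases), window=2))  # [None, 1.5, 2.5, 1.5, 2.0]
print(per_100k(500, 5_000_000))
```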

Data exploration and models

  • Initial exploratory data analysis (EDA)
  • Modelling:
    • PCA: are there trends, clusters or outliers in cases across countries and socio-economic variables?
    • Two linear regression models, \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon\):
      • \(x_1\) = population % above 65 years, \(x_2\) = urban population %; grouped by income level
      • \(x_1\) = GDP, \(x_2\) = population density; grouped by region
  • Exploring Covid “waves” and case fatality
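
Written out, and assuming each income level (or region) is fitted separately, the least-squares estimate for group \(g\) with \(n_g\) countries is given below, where \(x_{1,i}\) and \(x_{2,i}\) are the two predictors for country \(i\). The case fatality ratio (CFR) is shown in its usual form. These are standard formulas, not a description of the project's exact procedure.

```latex
\hat{\boldsymbol{\beta}}_g =
\begin{pmatrix} \hat\beta_{0,g} \\ \hat\beta_{1,g} \\ \hat\beta_{2,g} \end{pmatrix}
= \left( X_g^\top X_g \right)^{-1} X_g^\top \mathbf{y}_g,
\qquad
X_g = \begin{pmatrix}
  1      & x_{1,1}   & x_{2,1}   \\
  \vdots & \vdots    & \vdots    \\
  1      & x_{1,n_g} & x_{2,n_g}
\end{pmatrix}

\mathrm{CFR} = 100\% \times \frac{\text{confirmed deaths}}{\text{confirmed cases}}
```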

Results - Exploratory Data Analysis

Visualise correlation between income groups

Results - PCA

PCA done on continuous socio-economic features

  • PC1 captures differences in cases fairly well
  • Some countries are outliers in this projection
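
A PCA on continuous features can be sketched in plain Python: standardise each feature, form the correlation matrix, extract the first principal component by power iteration, and project each country onto it. In practice a library routine (e.g. R's prcomp) would do this; the power-iteration approach and the helper names standardize, first_component and pc1_scores are illustrative assumptions, and every feature is assumed to have non-zero spread.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def standardize(rows: Matrix) -> Matrix:
    """Centre each feature and scale it to unit sample standard deviation.
    Assumes every feature varies across countries (non-zero spread)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def first_component(z: Matrix, iterations: int = 500) -> Tuple[List[float], float]:
    """Leading eigenvector and eigenvalue of the correlation matrix via power iteration."""
    n, p = len(z), len(z[0])
    corr = [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)] for a in range(p)]
    # A non-uniform start avoids being exactly orthogonal to PC1 in symmetric
    # cases such as two perfectly anti-correlated features.
    v = [float(j + 1) for j in range(p)]
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(corr[a][b] * v[b] for b in range(p)) for a in range(p))
    return v, eigenvalue


def pc1_scores(rows: Matrix) -> Tuple[List[float], float]:
    """PC1 score per country and the share of total variance PC1 explains."""
    z = standardize(rows)
    loadings, eigenvalue = first_component(z)
    scores = [sum(zi * li for zi, li in zip(r, loadings)) for r in z]
    # The trace of a correlation matrix equals the number of features.
    return scores, eigenvalue / len(loadings)
```

Countries with unusually large absolute PC1 scores are candidates for the outliers mentioned above; note that the sign of a principal component is arbitrary.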

Results - Linear regression model

Deaths as a function of population % above 65 years and population % living in urban areas

Significant slopes:

  • Population % above 65 years: high, upper-middle and lower-middle income groups
  • Population % in urban areas: upper-middle income group
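
Whether a slope is "significant" is usually judged from its t-statistic (coefficient divided by its standard error) compared with a t distribution with \(n - 3\) degrees of freedom. The sketch below fits \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon\) separately per group and returns coefficients and t-statistics; it assumes one fit per income group, and the helpers invert, ols_two_predictors and fit_by_group are hypothetical, not the project's own code (which would more likely use R's lm).

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def invert(m: Matrix) -> Matrix:
    """Gauss-Jordan inverse of a small, non-singular square matrix (partial pivoting)."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        scale = a[col][col]
        a[col] = [x / scale for x in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def ols_two_predictors(x1: Sequence[float], x2: Sequence[float],
                       y: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 + e.

    Returns [b0, b1, b2] and their t-statistics. Needs more than 3 observations;
    a t-statistic is NaN if the fit leaves no residual variance.
    """
    X = [[1.0, a, b] for a, b in zip(x1, x2)]
    n, p = len(X), 3
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    residuals = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
    sigma2 = sum(e * e for e in residuals) / (n - p)  # residual variance
    se = [math.sqrt(sigma2 * xtx_inv[i][i]) for i in range(p)]
    t = [b / s if s > 0 else math.nan for b, s in zip(beta, se)]
    return beta, t


def fit_by_group(rows) -> Dict[str, Tuple[List[float], List[float]]]:
    """rows: (group, x1, x2, y) tuples, e.g. one per country; one fit per group."""
    columns = defaultdict(lambda: ([], [], []))
    for group, a, b, yv in rows:
        columns[group][0].append(a)
        columns[group][1].append(b)
        columns[group][2].append(yv)
    return {group: ols_two_predictors(*cols) for group, cols in columns.items()}
```

As a rough guide, \(|t|\) above about 2 corresponds to significance at the 5% level for reasonably large groups; small groups need a larger threshold.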

Results - Identifying covid waves

What is a good criterion for a wave?

  • We found that a weekly increase of 10% in deaths is a good identifier
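
One way to turn the 10% criterion into a per-week flag is sketched below. It assumes the criterion means that weekly new deaths rose by at least 10% compared with the previous week; the helpers weekly_totals and wave_flags, and the decision never to flag a rise from zero deaths, are illustrative assumptions.

```python
from typing import List


def weekly_totals(daily_deaths: List[int]) -> List[int]:
    """Sum daily new deaths into consecutive 7-day blocks; an incomplete last week is dropped."""
    return [sum(daily_deaths[i:i + 7]) for i in range(0, len(daily_deaths) - 6, 7)]


def wave_flags(weekly: List[int], threshold: float = 0.10) -> List[bool]:
    """Flag week k as 'in a wave' when deaths grew by at least `threshold`
    relative to week k-1. The first week has no reference and is never flagged;
    neither is a rise from zero deaths (an assumption)."""
    if not weekly:
        return []
    flags = [False]
    for previous, current in zip(weekly, weekly[1:]):
        flags.append(previous > 0 and (current - previous) / previous >= threshold)
    return flags


print(wave_flags([100, 115, 120, 90, 100]))  # [False, True, False, False, True]
```

With small weekly counts a 10% change is easily reached by noise, so a minimum count or a rolling mean may be worth applying first.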

Do Covid-19 waves appear to be synchronized in different countries?
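
Synchronisation can be summarised as the percentage of a region's countries that are flagged as in a wave in each week. The sketch assumes weekly wave flags per country that are aligned on the same weeks and a country-to-region lookup; percent_in_wave and the country labels in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def percent_in_wave(flags: Dict[str, List[bool]],
                    region_of: Dict[str, str]) -> Dict[str, List[float]]:
    """Percentage of each region's countries flagged as 'in a wave', week by week.

    flags maps country -> weekly wave flags, all aligned on the same weeks;
    region_of maps country -> region name.
    """
    by_region: Dict[str, List[List[bool]]] = defaultdict(list)
    for country, country_flags in flags.items():
        by_region[region_of[country]].append(country_flags)
    result = {}
    for region, series in by_region.items():
        n_weeks = len(series[0])
        result[region] = [100 * sum(s[w] for s in series) / len(series)
                          for w in range(n_weeks)]
    return result


# Hypothetical countries A, B and C.
flags = {"A": [False, True, True], "B": [False, False, True], "C": [True, False, False]}
regions = {"A": "Europe & Central Asia", "B": "Europe & Central Asia", "C": "South Asia"}
print(percent_in_wave(flags, regions))
```

Peaks that line up across regions would suggest synchronised waves; the maximum of each region's series gives its highest share of countries in a wave at one time.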

Covid-19 overview Shiny App

Discussion

  • In general, the EDA and linear regression suggest that more developed countries are hit “harder” by Covid-19.
    • This could be because less developed countries have a younger average age (due to higher general mortality rates), and younger populations are less severely affected by infection. The figure on slide 9 shows that low-income countries have the smallest range in the percentage of the population above 65 years.
    • However, here we have not considered, for example:
      • Data quality and under-reporting across countries, e.g. higher-income countries likely have better access to testing, so more of their cases and deaths are reported.
  • We see that Covid “waves” are not synchronous across global regions.
    • Some regions had visually very distinct wave peaks over time (Europe & Central Asia), while others were less distinct (Asia & Pacific).
    • Europe & Central Asia had the highest percentage of countries in a wave at any one time: at the highest peak, almost 70% of the countries in the region were in a wave.

Conclusion:

We want to point out that by looking at correlations we are not inferring causation. In addition, our linear model does not account for collinearity between features.
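
A simple check for collinearity in a two-predictor model is the variance inflation factor (VIF). With only two predictors, regressing one on the other gives \(R^2 = r^2\), so both share \(\mathrm{VIF} = 1 / (1 - r^2)\). The helpers pearson and vif_two_predictors below are a hypothetical sketch; commonly cited rules of thumb treat VIF values above roughly 5 or 10 as a sign of problematic collinearity.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1: Sequence[float], x2: Sequence[float]) -> float:
    """Variance inflation factor shared by both predictors of a two-predictor model.

    Undefined (division by zero) when the predictors are perfectly correlated.
    """
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r * r)
```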

Overall we found that

Moving forward __